CALIBRATION OF ARMED SERVICES VOCATIONAL APTITUDE
BATTERY FORMS 8, 9, AND 10

I.  INTRODUCTION 

The measurement of human characteristics has been a necessary part of selection and classification for military occupations for over 60 years. As with physical characteristics such as length, weight, or density, no natural units of measure exist for psychological characteristics; rather, artificial units are established by consensus. One of the most frequently used units of measurement for human characteristics is the percentile equivalent, which is reported in reference to some standard population or group. Ability tests used for military selection and classification are usually referenced to the 1944 mobilization base, and this is usually accomplished by equating new tests to old tests. Equating is the conversion of the score units of one test to the score units of another test. The current study describes the referencing of Forms 8a, 8b, 9a, 9b, 10a, and 10b of the Armed Services Vocational Aptitude Battery (ASVAB) to the mobilization base metric through the use of an anchor test.

There are two important reasons why current tests are equated to past tests. The first is to enable the testing agency to report the relative distribution of scores from year to year in a common metric. For example, the various military services like to be able to compare current accessions with past accessions on the same scale. The second reason is to provide a consistent meaning for cutting scores on selection and classification tests. In theory, a score at the 80th percentile on the new test can be said to be equivalent to a score at the 80th percentile on the past tests, and this equivalence becomes the definition of consistency.

When several forms of a test are to be operational simultaneously, it is an advantage if they are parallel, because parallel forms allow the use of a single equating table. Gulliksen (1950) offers a definition of parallel tests that includes sameness of factor structure, equality of means, equality of variances, and equality of non-zero correlations with an external criterion. It also seems reasonable to include equivalence of skew and kurtosis (Ree, 1977), the third and fourth moments of the distribution, although little research exists in this area.
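As a rough illustration of the distributional part of this definition, the sketch below computes the first four moments of two score distributions and compares them against tolerances. The function names and tolerance values are hypothetical, and the check covers only means, variances, skew, and kurtosis; sameness of factor structure and of correlations with an external criterion would have to be examined separately.

```python
def moments(scores):
    """Return mean, variance, skewness, and excess kurtosis (population formulas)."""
    n = len(scores)
    mean = sum(scores) / n
    dev = [s - mean for s in scores]
    var = sum(d ** 2 for d in dev) / n
    skew = sum(d ** 3 for d in dev) / n / var ** 1.5
    kurt = sum(d ** 4 for d in dev) / n / var ** 2 - 3.0  # 0 for a normal curve
    return mean, var, skew, kurt


def moments_match(form_a, form_b, tol=(0.5, 1.0, 0.1, 0.2)):
    """True if each moment of form_a is within its (hypothetical) tolerance of form_b."""
    return all(abs(a - b) <= t
               for a, b, t in zip(moments(form_a), moments(form_b), tol))
```

A real comparison would use the full raw-score distributions of each form; the tolerances would have to be set by the analyst.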

Parallel tests may be constructed by assigning items randomly to forms; this method is usually called “Randomly Parallel Forms.” Alternatively, items may be matched on difficulty and/or discrimination, stratified, and then assigned randomly to one of a set of multiple forms; this procedure is called “Stratified Parallel.” Analytic methods of constructing parallel forms also exist (Ree, 1976), but they tend to be computationally intensive.
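A minimal sketch of the Stratified Parallel idea, assuming items are stratified on difficulty alone (discrimination is ignored here) and using the illustrative name `stratified_parallel_forms`, which is not a procedure from this report: items are ranked by difficulty, cut into strata containing as many items as there are forms, and the items in each stratum are dealt randomly to the forms.

```python
import random


def stratified_parallel_forms(items, n_forms, seed=0):
    """Deal items into n_forms forms, one item per form from each difficulty stratum.

    items: list of (item_id, difficulty) pairs, difficulty as a p-value.
    """
    rng = random.Random(seed)
    ranked = sorted(items, key=lambda item: item[1])  # stratify on difficulty
    forms = [[] for _ in range(n_forms)]
    for start in range(0, len(ranked), n_forms):
        stratum = ranked[start:start + n_forms]       # items of similar difficulty
        slots = list(range(n_forms))
        rng.shuffle(slots)                            # random form for each item
        for (item_id, _), form in zip(stratum, slots):
            forms[form].append(item_id)
    return forms
```

Because every form receives exactly one item from each full stratum, the forms end up with closely matched difficulty distributions; dropping the stratification and shuffling all items at once would give the randomly parallel variant.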

Using the Stratified Parallel method, Forms 8, 9, and 10 of the ASVAB were constructed to be parallel in terms of raw scores so that a single table might be used to convert raw scores on any of the six forms to percentile equivalents. The objective of this study was to determine whether a single table was appropriate.

Calibration  of  Tests 

Because two or more forms of a test can never be made precisely equivalent in range and level, it is necessary to render the forms interchangeable by equating. The equating procedure may be defined (Flanagan, 1951; Angoff, 1971) as converting the scoring units of one test to the scoring units of another.

In general, two procedures have been in common use: linear and equipercentile equating. Linear equating requires that equivalent Z-score transformations of the two tests represent the same cumulative proportion; said differently, the shapes of the score distributions should differ only trivially. Equipercentile equating, on the other hand, makes no such assumption of Z-score equivalence. The linear method offers the advantage of dealing with analytic statistics (means, standard deviations, etc.) that are verifiable. Equipercentile equating is preferable when the distributions differ and is often offered as the definition of equating (Jaeger, 1981). It should be noted that the linear and equipercentile approaches coincide when the two distributions to which they are applied have the same shape.
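The contrast between the two methods can be sketched as follows. Linear equating matches Z-scores, mapping x to mean(Y) + (sd(Y)/sd(X)) * (x - mean(X)); equipercentile equating maps x to the score on the other test that has the same percentile rank. The mid-point percentile rank and the linear interpolation used here are common conventions assumed for illustration, not the report's exact computation, and the function names are hypothetical. When Y is a linear transformation of X, so that both distributions have the same shape, the two maps agree, as the text notes.

```python
from statistics import mean, pstdev


def linear_equate(x, X, Y):
    """Map raw score x on test X to the scale of Y by matching Z-scores."""
    return mean(Y) + pstdev(Y) / pstdev(X) * (x - mean(X))


def percentile_rank(x, X):
    """Proportion of X below x, counting half of the scores equal to x."""
    below = sum(1 for s in X if s < x)
    equal = sum(1 for s in X if s == x)
    return (below + 0.5 * equal) / len(X)


def equipercentile_equate(x, X, Y):
    """Map x to the Y score with the same percentile rank (linear interpolation)."""
    ys = sorted(Y)
    pos = percentile_rank(x, X) * len(ys) - 0.5
    pos = min(max(pos, 0.0), len(ys) - 1.0)  # clamp to the observed range
    lo = int(pos)
    hi = min(lo + 1, len(ys) - 1)
    return ys[lo] + (pos - lo) * (ys[hi] - ys[lo])
```

With skewed or otherwise differently shaped distributions the two maps diverge, which is the situation in which the text recommends the equipercentile method.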
Angoff (1971) uses the term “calibration” to describe the equating of tests that measure different abilities. For example, the equating of a test of Word Knowledge to a test of Reading would be called “calibration.” It is therefore appropriate to say that military selection and classification tests have been calibrated rather than equated. Angoff is somewhat critical of the calibration technique because of a problem inherent in its nature: it is repeatedly stated in the literature (Angoff, 1971; Flanagan, 1951; Jaeger, 1981) that calibration, unlike equating, does not lead to solutions that are unique across samples, although empirical evidence is not offered. The non-uniqueness of the solution makes the interpretation of several calibrations of the same test, or of parallel forms of the test, difficult. Military selection and classification tests have nonetheless frequently been calibrated rather than equated; Form 8a of the ASVAB, for example, was linked via calibration to an anchor test using several different subject groups, ranging from high school students to new military recruits. The effects of calibrating, as opposed to equating, require further study in order to understand fully the consequences of the technique.

Three previous studies (Boldt, 1980; Maier & Grafton, 1981; Sims & Truss, 1980) calibrated Form 8a to Armed Forces Qualification Test Form 7a (AFQT-7a). Because ASVAB Forms 8, 9, and 10 were constructed to be parallel by the method described previously as “Stratified Parallel Forms,” it was reasoned that calibrating one form was tantamount to calibrating all forms. That is, because calibration sets raw scores on the calibrated test equivalent to raw scores on an anchor or target test, and because the raw scores of the six forms were constructed to be equivalent, any one form may be calibrated and the results should then be applicable to all the other forms. The crucial requirement is that the forms be parallel; if they are not, separate calibrations are required. The present study seeks to verify the results of the earlier calibration studies, which produced the tables implemented 1 October 1980. These are referred to as the operational tables.

In  order  to  determine  if  the  assumptions  underlying  the  procedures  for  calibrating  ASVAB-8a  and  thereby 
Forms  8b,  9a,  9b,  10a,  and  10b  were  acceptable,  an  Initial  Operational  Test  and  Evaluation  (IOT&E)  was 
undertaken.  The  IOT&E  was  begun  shortly  after  the  test  was  put  into  operation  for  selection  and  classification  of 
candidates  for  military  enlistment. 


II.  METHOD


The  Tests 

Forms 8, 9, and 10 of the ASVAB are multiple aptitude batteries composed of 10 subtests. Eight of the subtests are power subtests, and two are speeded subtests. Table 1 shows the name and number of items of each subtest and whether it is a power or a speeded subtest. These forms differ from the previous ASVAB forms by the inclusion of the Paragraph Comprehension (PC) and Coding Speed (CS) subtests, by the combination of Automotive Information and Shop Information into a single subtest (AS), and by the deletion of the subtests measuring Space Perception, Attention to Detail, and General Information. The overall administration time for any of the forms is about 180 minutes, and in operation the test is answered on a machine-scannable answer sheet.


Table 1.  Name and Number of Items for Power and Speeded ASVAB
          Subtests in Forms 8, 9, and 10

Name                              Number of Items    Power/Speed
General Science (GS)                    25           Power
Word Knowledge (WK)                     35           Power
Arithmetic Reasoning (AR)               30           Power
Paragraph Comprehension (PC)            15           Power
Numerical Operations (NO)               50           Speed
Coding Speed (CS)                       84           Speed
Auto and Shop Information (AS)          25           Power
Mathematics Knowledge (MK)              25           Power
Mechanical Comprehension (MC)           25           Power
Electronics Information (EI)            20           Power
The Armed Forces Qualification Test (AFQT) composite is used for military enlistment qualification and is composed of the PC, Word Knowledge (WK), Arithmetic Reasoning (AR), and Numerical Operations (NO) subtests. All subtests are unit weighted except NO, which is weighted by one-half.
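The weighting just described amounts to WK + AR + PC + NO/2. The sketch below computes that raw composite; the range check uses the item counts from Table 1, and the function name is illustrative. The report does not say whether an odd NO score is rounded after halving, so no rounding is applied here.

```python
# Item counts from Table 1, used only to reject impossible raw scores.
MAX_ITEMS = {"WK": 35, "AR": 30, "PC": 15, "NO": 50}


def afqt_composite(wk, ar, pc, no):
    """ASVAB-AFQT raw composite: WK, AR, and PC unit weighted, NO weighted one-half.

    Odd NO scores yield a half point; any operational rounding is not modeled.
    """
    for name, score in (("WK", wk), ("AR", ar), ("PC", pc), ("NO", no)):
        if not 0 <= score <= MAX_ITEMS[name]:
            raise ValueError(f"{name} score {score} outside 0..{MAX_ITEMS[name]}")
    return wk + ar + pc + no / 2
```

The maximum possible composite is 35 + 30 + 15 + 50/2 = 105. Applied to the Form 8a subtest means in Table 7, the weights give 24.64 + 16.47 + 10.08 + 34.52/2 = 68.45, slightly below the tabled ASVAB-AFQT mean of 68.69; the small gap may reflect an operational rounding rule that the report does not describe.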

The  AFQT-7a  served  as  the  anchor  test.  This  test  was  previously  used  for  enlistment  qualification  but  has 
been  inactive  for  several  years.  It  was  chosen  as  the  anchor  test  because  its  content  is  close  to  that  of  the  test  used  in 
the  1944  mobilization  base  development  testing.  It  is  not  believed  to  be  compromised,  and  an  earlier  form  (Form  3) 
of  the  ASVAB  was  calibrated  against  it. 

The AFQT-7a has 100 items evenly distributed among the ability areas of WK, AR, Boxes (B), and Tool Knowledge (TK). The first two, WK and AR, are similar to the like-named subtests in the current AFQT portion of the ASVAB. The latter two, B and TK, are not found in the current AFQT portion of the ASVAB. It is this disparity in the ability areas measured that leads to labeling the equating effort a “calibration” and that leads to the problem of non-unique solutions.


Administration  of  Tests  to  Subjects 

A sample of subjects was drawn to provide for equal geographical representation. Data collection took place in 20 Armed Forces Examining and Entrance Stations (AFEESs). Table 2 shows the locations of the AFEESs and the number of subjects at each. Each subject took the AFQT-7a and one form of the ASVAB, which was used for qualification for military enlistment. The AFQT-7a was administered on a separate answer sheet. The ASVAB and AFQT-7a were administered in counterbalanced order, the order of administration each day being the reverse of that employed the previous day. Tests were also administered at locations affiliated with the AFEES, called Mobile Examining Team (MET) sites and Office of Personnel Management (OPM) sites.


Table 2.  AFEES Sites and Sample at Sites*

AFEES            Subjects
Chicago             1,500
Cleveland           1,300
Atlanta               800
Baltimore           1,600
Boston              1,300
Jacksonville        1,400
Los Angeles         2,600
Montgomery            900
Newark              1,400
Philadelphia        1,400
Richmond            1,200
St. Louis           1,400
Spokane               500
Denver                600
Houston               600
Phoenix               500
Portland              400
San Diego             600
Minneapolis         1,200
Omaha               1,200
Total              22,400

*Sites included AFEES, MET, and OPM locations for test administration.
Data  Editing


All answer sheets were visually inspected for completeness of information and for stray marks. The ASVAB uses a three-part answer sheet that is optically scannable and has precoded numbers on each sheet to keep the triplet set intact during operational scanning. There is also an optically scannable social security account number (SSAN) grid. The operational ASVAB answer sheets, which had been scanned at the AFEES, were rescanned, and the required triplets of answer sheets were merged. The AFQT-7a answer sheets were also scanned and merged with the ASVAB records for each subject. Because only males were represented in the World War II (1944) mobilization base, female subjects were deleted from the original sample to leave a “males only” sample of applicants.

Three other editing procedures were employed. First, to determine whether the correct form of the test (8a to 10b) was specified on the answer sheet, a check was performed by scoring the first four items in each of the NO, CS, and WK subtests, 12 items in all. NO and CS are speeded subtests, and the WK subtest has the easiest items first, so an examinee scored with the correct key would be expected to answer most of these items correctly. It was reasoned that a score of 6 or less on the 12 items was suspect and should be examined further. This was accomplished by applying each of the six form-specific scoring keys for these 12 items to the answer sheet and comparing the magnitudes of the scores from the various keys. For example, if the subject coded “Form 8a” on the answer sheet and obtained a score of 2 from the Form 8a key but a score of 11 from the Form 10b key, then the entire test was scored using the 10b scoring key. If, on the other hand, low scores were found for all forms, then the key for the form indicated by the examinee was retained.
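The decision rule described above, together with the outcomes reported later for the Table 5 cases, can be sketched as follows. The flag cutoff of 6 comes from the text; the `margin` by which another key must beat the coded key before the form is changed is a hypothetical parameter, since the report judged these cases by inspection. The two deletion rules (a score of 0 or 1 on every key, or the coded form scoring lowest of the six) follow the discussion of Table 5.

```python
FORMS = ["8a", "8b", "9a", "9b", "10a", "10b"]


def verify_form(coded, scores, flag_cutoff=6, margin=4):
    """Decide which scoring key to use for one answer sheet.

    coded:  form the examinee coded on the answer sheet.
    scores: form -> score on the 12 check items under that form's key.
    margin: hypothetical lead another key needs before the form is changed.
    Returns ("keep", form), ("change", form), or ("delete", None).
    """
    own = scores[coded]
    if own > flag_cutoff:                    # not flagged at all
        return ("keep", coded)
    if max(scores.values()) <= 1:            # near zero on every key
        return ("delete", None)
    best = max(scores, key=scores.get)
    unique_best = sum(s == scores[best] for s in scores.values()) == 1
    if best != coded and unique_best and scores[best] - own >= margin:
        return ("change", best)              # another key fits far better
    if all(own < s for f, s in scores.items() if f != coded):
        return ("delete", None)              # coded form scores lowest
    return ("keep", coded)                   # low everywhere: trust the coding


# The four cases of Table 5: (form coded, scores under the 8a..10b keys).
CASES = [
    ("8a", [4, 10, 2, 1, 3, 5]),
    ("8b", [1, 0, 0, 1, 1, 0]),
    ("9a", [3, 4, 2, 3, 4, 3]),
    ("9b", [1, 4, 3, 6, 3, 1]),
]
```

Applied to these four cases, the rule changes Case 1 to 8b, deletes Cases 2 and 3, and keeps Case 4 as 9b, which matches the outcomes reported for Table 5.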

The  second  data  editing  procedure  was  designed  to  see  if  differences  existed  among  types  of  testing  sites: 
AFEES,  MET,  and  OPM.  This  was  accomplished  by  inspecting  the  mean  and  standard  deviation  of  the  absolute 
differences,  by  type  of  test  site,  between  the  scores  on  the  AFQT-7a  and  the  AFQT  portion  of  the  six  forms  of  the 
ASVAB.  Systematic  deviance  in  a  type  of  testing  site  would  indicate  that  data  from  that  kind  of  site  should  be 
discarded. 
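A sketch of this site-type comparison, assuming each record carries a site type and the two raw scores; the record layout and function name are illustrative. A site type whose mean or SD of absolute differences stood apart from the others would show the systematic deviance the text refers to; the report gives no numeric threshold for that judgment.

```python
from collections import defaultdict
from statistics import mean, stdev


def abs_diff_by_site(records):
    """records: iterable of (site_type, asvab_afqt_raw, afqt7a_raw).

    Returns site_type -> (mean |difference|, SD of |difference|, n).
    """
    groups = defaultdict(list)
    for site, asvab, afqt7a in records:
        groups[site].append(abs(asvab - afqt7a))
    return {site: (mean(d), stdev(d) if len(d) > 1 else 0.0, len(d))
            for site, d in groups.items()}
```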

The third and final check was to investigate the bivariate scatter plots and standardized residuals derived from regressing scores on each ASVAB-AFQT on scores on AFQT-7a, scores on each AR on Mathematics Knowledge (MK), and scores on each NO on CS. These three sets of variables allow investigation of the consistency of responding between the first and second halves of the ASVAB for both power and speeded tests, as well as between a test actually used for military enlistment qualification (ASVAB) and a test (AFQT-7a) given for equating purposes only. Each pair of variables is highly correlated. Examinees with standardized residuals outside the range of ±2.50 were identified for further scrutiny. They were located on the appropriate scatter plot and were deleted if it was reasonably clear from visual inspection that they represented true outliers, lying substantially away from the bulk of the scatter.
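A minimal sketch of the residual screen for one pair of variables, assuming simple least-squares regression and standardizing each residual by the standard error of estimate (n - 2 degrees of freedom); the report does not say which standardization it used, and `flag_outliers` is an illustrative name. Flagged cases are only candidates: in the study the final decision was made by inspecting the scatter plot.

```python
import math


def flag_outliers(x, y, cutoff=2.50):
    """Regress y on x by ordinary least squares and return the indices whose
    standardized residual lies outside +/- cutoff."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Standard error of estimate with n - 2 degrees of freedom.
    see = math.sqrt(sum(r * r for r in resid) / (n - 2))
    return [i for i, r in enumerate(resid) if abs(r / see) > cutoff]
```

For the study's three pairs, `x` and `y` would be AFQT-7a and ASVAB-AFQT, MK and AR, and CS and NO raw scores for the subjects taking one form.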


Sample 

From the original sample collected at the AFEES, MET, and OPM sites, females and those who failed the data editing were removed. Six male-only samples were created based on the form of ASVAB administered. Random half-samples were selected within each of the six male-only samples created for Forms 8a through 10b. These half-samples were established in order to cross-validate results and to investigate the consistency of various estimates made in the equating process.
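The half-sample split can be sketched as a seeded shuffle within each form's sample. The seed, the dictionary layout, and the function name are assumptions for illustration; the report does not describe how its random halves were drawn.

```python
import random


def half_samples(ids_by_form, seed=0):
    """Split each form's subject IDs into two random, disjoint halves."""
    rng = random.Random(seed)
    halves = {}
    for form, ids in ids_by_form.items():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        halves[form] = (shuffled[:mid], shuffled[mid:])
    return halves
```

Each calibration can then be repeated on the full sample and on both halves, giving the three estimates whose agreement is examined later.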


Equipercentile  Equating  and  Calibrating 

It is appropriate to specify that Forms 8, 9, and 10 of the ASVAB were calibrated using AFQT-7a as a standard. The plan identified as “Design II” by Angoff (1971) was used for each pair of composites to be calibrated.

Test calibration was accomplished using raw scores on the ASVAB-AFQT and on the AFQT-7a as a starting point. For each raw score distribution of ASVAB-AFQT and AFQT-7a, sample-dependent percentiles from 1 to 99 were computed in unit intervals, and the ASVAB-AFQT and AFQT-7a raw scores falling at the same percentile were paired. This is essentially a raw-score-to-raw-score procedure. Previous equatings that used only ASVAB-AFQT raw score to AFQT-7a percentile equivalents, rather than ASVAB-AFQT raw score to AFQT-7a raw score, were deemed insufficient, as information was lost when raw score point intervals were collapsed. The raw-score-to-raw-score procedure was used because it is more widely accepted and more efficient. After the raw score equivalents were established, it was necessary to smooth the resulting line. This smoothing was accomplished analytically by polynomial regressions up to the third order, and the fit of each regression was used to determine the best curve.
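A sketch of the raw-score-to-raw-score procedure and its polynomial smoothing. It assumes Python's `statistics.quantiles` (which returns the 99 cut points for `n=100`) with the inclusive method as the percentile definition, and an ordinary least-squares fit solved through the normal equations; the report specifies neither computation, and the function names are hypothetical.

```python
from statistics import quantiles


def percentile_pairs(asvab_afqt, afqt7a):
    """Pair the raw scores found at percentiles 1..99 of the two distributions."""
    xs = quantiles(asvab_afqt, n=100, method="inclusive")  # 99 cut points
    ys = quantiles(afqt7a, n=100, method="inclusive")
    return list(zip(xs, ys))


def polyfit(xs, ys, degree):
    """Least-squares coefficients c0..c_degree, solved from the normal equations."""
    m = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    for col in range(m):  # Gaussian elimination with partial pivoting
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * m
    for r in reversed(range(m)):  # back substitution
        coef[r] = (b[r] - sum(a[r][c] * coef[c] for c in range(r + 1, m))) / a[r][r]
    return coef


def smoothed_equivalents(pairs, degree, raw_scores):
    """Evaluate the fitted polynomial at each ASVAB-AFQT raw score."""
    xs, ys = zip(*pairs)
    coef = polyfit(xs, ys, degree)
    return {x: sum(c * x ** i for i, c in enumerate(coef)) for x in raw_scores}
```

The smoothed values are AFQT-7a raw-score equivalents; converting them to the mobilization-base percentile metric would require the AFQT-7a norms, which are not reproduced here.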

The creation of half-samples was especially useful in determining the relative stability of the quadratic and cubic regression weights. Each smoothing was accomplished three times (in the full sample and in each half-sample), and the weights were retained only if they remained relatively constant. The cases in which higher order weights did not remain constant were smoothed by the first order polynomial, as its weight always remained constant. Table 3 provides an instructive example using invented data. Samples 1 through 3 (Composite 1) show instances where the weights (W) are stable and thus acceptable for smoothing the equating line. Samples 4 through 6 (Composite 2) show an instability of weights due to capitalization on chance fluctuation, which causes the higher order polynomials to be rejected. Note how the values in the columns marked “W2” and “W3” fluctuate in these later samples but not in samples 1 through 3; this kind of instability of weights should be the basis for rejection of the polynomial. Note also how the standard error of estimate (SEE) decreases substantially as the higher order terms are entered in samples 1 through 3, but not in samples 4 through 6. This consistency and reduction of SEE are indicative of a better fit. Three additional points are worthy of note. First, the R² is observed to change only in the trivial third decimal place, and little emphasis should be placed on it. Second, the standard error of estimate is appropriate for determining fit. Finally, care must be exercised not to interpret R and R² as correlations between raw scores for subjects. These indexes reflect the covariation of the equated percentile points in a distribution and must be expected to be quite high. One advantage of this method of smoothing is that it is analytic and reproducible, thereby avoiding the myriad pitfalls of hand smoothing.
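The retention rule can be expressed as a small function, shown here on the invented values of Table 3. The stability tolerance (0.01 on each second- and higher-order weight across the full sample and the two half-samples) and the requirement that SEE fall in every sample are assumptions chosen for illustration; the report describes the judgment only qualitatively, and `choose_order` is a hypothetical name.

```python
def choose_order(fits, tol=0.01):
    """Pick the smoothing polynomial order from repeated fits.

    fits[k] lists, for the full sample and each half-sample, a pair
    (weights, see) from the order-k fit, with weights = (W1, ..., Wk).
    An order is accepted only if its W2..Wk vary by at most `tol` across
    the samples and its SEE is lower than the accepted lower order's SEE
    in every sample; the linear fit is always accepted.
    """
    chosen = 1
    for k in sorted(order for order in fits if order > 1):
        stable = all(
            max(w[j] for w, _ in fits[k]) - min(w[j] for w, _ in fits[k]) <= tol
            for j in range(1, k)             # positions of W2..Wk
        )
        better = all(see_k < see_prev
                     for (_, see_k), (_, see_prev) in zip(fits[k], fits[chosen]))
        if not (stable and better):
            break                            # stop at the first rejected order
        chosen = k
    return chosen


# Invented values from Table 3: (weights, SEE) for full, half 1, half 2.
COMPOSITE_1 = {
    1: [((1.045,), 2.618), ((1.050,), 3.012), ((1.055,), 3.000)],
    2: [((.981, .056), 1.072), ((.980, .060), 1.401), ((.980, .058), 1.300)],
    3: [((.970, .049, .051), .674), ((.970, .051, .050), .801),
        ((.971, .049, .052), .790)],
}
COMPOSITE_2 = {
    1: [((1.061,), 2.710), ((1.059,), 2.950), ((1.072,), 2.870)],
    2: [((.982, .311), 2.600), ((.931, .103), 2.710), ((.901, .081), 2.650)],
    3: [((.961, .032, .202), 1.930), ((.929, .009, .001), 2.070),
        ((.918, .050, .400), 1.800)],
}
```

With these values the rule keeps the cubic for Composite 1 and falls back to the linear fit for Composite 2, in line with the discussion of Table 3.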


Table 3.  Example of Smoothing by Polynomial

Sample   Type        R²      W1      W2      W3     SEE

Composite 1
   1     Full     .9987   1.045                   2.618
   1              .9999    .981    .056           1.072
   1              .9999    .970    .049    .051    .674
   2     Half     .9985   1.050                   3.012
   2              .9999    .980    .060           1.401
   2              .9999    .970    .051    .050    .801
   3     Half     .9989   1.055                   3.000
   3              .9999    .980    .058           1.300
   3              .9999    .971    .049    .052    .790

Composite 2
   4     Full     .9999   1.061                   2.710
   4              .9999    .982    .311           2.600
   4              .9999    .961    .032    .202   1.930
   5     Half     .9981   1.059                   2.950
   5              .9999    .931    .103           2.710
   5              .9999    .929    .009    .001   2.070
   6     Half     .9992   1.072                   2.870
   6              .9999    .901    .081           2.650
   6              .9999    .918    .050    .400   1.800

Table  Generation 

The ultimate goal of this effort is to produce tables for each ASVAB AFQT composite from Forms 8a through 10b and to determine whether a single table for each composite is applicable across the set of six forms. The tables were generated by selecting the appropriate smoothed curve and evaluating it at each raw score point in the range of the AFQT composite. This yielded six equating tables, one for each ASVAB form, and an average table was created from these six. Several deviation indexes were computed to compare these tables with one another and with the operational table: the root-mean-square (RMS) deviation and the average absolute deviation (AAD). Additionally, the similarity between classification into mental categories (see Grunzke, Guinn, & Stauffer, 1970) by the operational table and by the six form-specific tables was investigated by computing a two-way frequency table of classification.
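A minimal sketch of these comparisons, assuming each conversion table is a dictionary from raw score to percentile and that deviations are averaged over subjects, as in the Results; the function names are illustrative. The category bands (I: 93-99, II: 65-92, III: 31-64, IV: 10-30, V: 1-9) are the commonly used DoD AFQT percentile ranges and are an assumption here; the report takes its category definitions from Grunzke, Guinn, and Stauffer (1970).

```python
import math
from collections import Counter

# Assumed AFQT percentile floors for mental categories I through V.
CATEGORY_FLOORS = [(93, "I"), (65, "II"), (31, "III"), (10, "IV"), (1, "V")]


def average_table(tables):
    """Average several raw-score -> percentile tables entry by entry."""
    return {raw: sum(t[raw] for t in tables) / len(tables) for raw in tables[0]}


def deviation_indexes(table_a, table_b, subject_raw_scores):
    """RMS and AAD of the percentile differences over all subjects."""
    d = [table_a[r] - table_b[r] for r in subject_raw_scores]
    rms = math.sqrt(sum(x * x for x in d) / len(d))
    aad = sum(abs(x) for x in d) / len(d)
    return rms, aad


def mental_category(percentile):
    """Map a percentile to a mental category using the assumed floors."""
    for floor, name in CATEGORY_FLOORS:
        if percentile >= floor:
            return name
    return "V"


def classification_crosstab(table_a, table_b, subject_raw_scores):
    """Two-way frequency table of (category under A, category under B)."""
    return Counter((mental_category(table_a[r]), mental_category(table_b[r]))
                   for r in subject_raw_scores)
```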


III.  RESULTS AND DISCUSSION


Data  Editing 

The check to determine whether the correct form (8a through 10b) was coded flagged 427 subjects for scrutiny. Table 4 shows the number of cases, by form, that were identified for verification. Across all forms, 32 cases were deleted, 51 had form changes, and 344 were left unchanged.


Table 4.  Number of Subjects Flagged by Key Verification
          by Test Form

Form    Total Subjects    Not Key Flagged    Key Flagged
8a                2650               2561             89
8b                2529               2477             52
9a                2625               2549             76
9b                2527               2467             60
10a               2510               2429             81
10b               2438               2369             69

By way of example, the four cases displayed in Table 5 are instructive. Case 1 was changed to 8b because of the low score on 8a compared to the high score on 8b. Case 2 was deleted because a score of one or zero on all scoring keys indicated that the examinee was unlikely to have been trying very hard. Case 3 was deleted because it was impossible to determine which test the examinee was administered, as the form coded on the answer sheet had the lowest score of the six. Case 4 was kept, despite the low scores, because the score for the form coded was the highest.

Table 5.  Example Cases from Key Verification Procedures

                         Scores for Forms
Case   Form Coded    8a   8b   9a   9b   10a   10b
1      8a             4   10    2    1     3     5
2      8b             1    0    0    1     1     0
3      9a             3    4    2    3     4     3
4      9b             1    4    3    6     3     1

The  second  data  editing  procedure  of  investigating  differences  among  types  of  testing  sites  by  comparison  of 
absolute  differences  on  AFQT-7a  and  ASVAB-AFQT  revealed  no  systematic  differences.  Consequently,  all  site 
types  were  deemed  appropriate  for  inclusion  in  the  study. 

The third and final check was to investigate the bivariate scatter plots and standardized residuals derived from regressing scores on each ASVAB-AFQT on scores on AFQT-7a, scores on each AR on MK, and scores on each NO on CS. Examinees with standardized residuals outside the range of ±2.50 were identified for further scrutiny. Each was located on the appropriate scatter plot, and the score was deleted if it was clear that the examinee represented a true outlier by being substantially away from the bulk of the scatter. It was observed, for example, that some examinees displayed high scores on AR but very low scores on MK. This is an illogical situation that may be accounted for by the examinee's having obtained some answers for the AR subtest, which is in the qualification portion of the ASVAB, but not for MK. It might also be an indication of faltering motivation on the later MK test. In either case, the examinee should not be in the sample. Figure 1 shows this condition; the observations within the dotted boundaries were subject to scrutiny and potential deletion. Only 132 subjects were removed during this procedure. The final sample comprised 15,115 male subjects.


Figure 1.  Scatter Plot of Arithmetic Reasoning and Numerical Operations Test Scores.


Analysis  Samples

Table  6  displays  the  sample  sizes  for  each  of  the  six  male-only  samples. 




Table 6.  Number of Subjects by ASVAB Form

Form    Number of Subjects
8a             2,621
8b             2,506
9a             2,587
9b             2,500
10a            2,484
10b            2,417

Descriptive  Statistics 

Table 7 shows the descriptive statistics for each of the ASVAB subtests, the ASVAB-AFQT, and AFQT-7a. As can be seen, the means (X̄) differ relatively little across forms, as do the standard deviations (σ). Cumulative frequency distributions of the scores are of the same general shape, with few differences among them.


Table 7.  Descriptive Statistics for ASVAB 8, 9, and 10
          and AFQT-7a

Sub-                              ASVAB Form Administered
test         8a            8b            9a            9b           10a           10b
              X̄      σ      X̄      σ      X̄      σ      X̄      σ      X̄      σ      X̄      σ
GS        15.29   4.83  15.10   4.92  14.61   5.51  14.59   5.54  14.66   5.09  14.74   5.15
AR        16.47   6.76  17.13   7.13  16.92   6.96  17.28   6.86  17.93   6.70  17.09   6.98
WK        24.64   7.55  23.44   7.56  23.53   7.66  23.72   7.75  22.99   7.82  23.43   7.60
PC        10.08   3.38   9.84   3.34   9.27   3.48  10.02   3.28   9.59   3.77  10.02   3.17
NO        34.52  10.17  34.75  10.05  34.29  10.58  33.93  10.40  35.03  10.04  34.58  10.36
CS        41.29  15.04  41.27  15.23  41.42  15.05  41.70  14.53  42.34  14.84  42.08  14.42
AS        15.25   5.82  15.24   5.76  15.77   5.77  15.74   5.71  15.77   5.65  15.83   5.66
MK        11.32   5.54  11.14   5.43  11.24   5.46  11.20   5.60  12.33   5.33  12.35   5.56
MC        14.44   5.43  14.14   5.41  14.28   5.33  14.32   5.07  14.45   5.25  14.27   5.20
EI        11.50   4.31  11.46   4.29  11.94   4.13  12.05   3.98  12.06   4.03  11.75   4.03
VE        34.72  10.45  33.28  10.40  32.80  10.63  33.73  10.55  32.58  11.09  33.46  10.26
AFQT      68.69  19.22  68.02  19.79  67.10  19.88  68.22  19.78  68.27  19.85  68.29  19.61
QT-7a     54.77  20.80  54.37  20.94  54.68  21.02  54.91  21.05  54.89  20.77  55.40  20.82

Note.  AFQT-7a is denoted by QT-7a.


Equating 

All of the AFQT composites were calibrated using the AFQT-7a as the standard and were smoothed using polynomial regression with the constraint that the curve exhibit positive monotonicity. This meant that the curve was not permitted to turn downward; a downturn would have given some higher raw scores lower equivalents than lower raw scores, so that a single equated value would correspond to two different raw scores.
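A simple way to enforce this constraint, sketched here under the assumption that candidate curves are stored as coefficient lists (c0, c1, ...), is to evaluate each curve on the raw-score grid and discard any that decrease. The helper names are illustrative, and the half-point grid in the example is an assumption motivated by the one-half weight on NO, which can yield half-point composites.

```python
def is_monotone_nondecreasing(coef, raw_points):
    """True if the polynomial sum(c_i * x**i) never drops between consecutive
    raw score points (raw_points must be sorted in increasing order)."""
    values = [sum(c * x ** i for i, c in enumerate(coef)) for x in raw_points]
    return all(later >= earlier for earlier, later in zip(values, values[1:]))


def highest_monotone_order(fits, raw_points):
    """fits maps polynomial order -> coefficients; return the highest monotone order."""
    ok = [k for k, coef in fits.items() if is_monotone_nondecreasing(coef, raw_points)]
    return max(ok) if ok else None
```

For example, the quadratic 3x - 0.1x² rises until x = 15 and then turns down, so on a 0-30 grid it is rejected in favor of a monotone lower-order fit.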

Each composite was calibrated in a full sample and two randomly selected half-samples. The smoothing was applied to each subsample independently, and all three were used to decide on the appropriate smoothing on the basis of consistency among the samples and reduced standard error of estimate.

It  is  worth  noting  that  the  analytic  procedure  automatically  provides  a  measure  of  fit,  the  standard  error  of 
estimate.  Hand  smoothing,  as  used  in  previous  equating  studies  of  ASVAB-8a,  does  not  provide  such  an  index 
without  laborious  computation.  A  goodness-of-fit  of  the  equating  curve  for  the  previous  studies  was  not  assessed. 
This  is  one  of  the  drawbacks  to  the  nonanalytic  method  used  previously. 


Tables  for  the  AFQT  Forms 

The tests were quite similar in frequency distribution and in relationship to the calibration standard of AFQT-7a. This led to generally equivalent conversion tables for all six forms. Table 8 shows the conversions for each of the forms and the average correspondence of the six forms to the percentile standard or metric of AFQT-7a.
To determine whether the ASVAB conversion tables truly differ, measures of deviation of subject percentile scores were computed using the operational, average, and form-specific tables. These measures were the RMS deviation and the AAD between pairs of interest. Table 9 shows the RMS and AAD for the AFQT. Although there are some differences among forms, the magnitudes of the differences are quite small, none reaching two percentile points. This is quite consistent with the two previous analyses and reinforces a picture of relatively small differences.


Table 9.  Deviation Measures Comparing Use of One Versus
          Six Conversion Tables

                     ASVAB AFQT Composites 8a thru 10b
                                     Test Form
Measure  Comparison   Pooled     8a     8b     9a     9b    10a    10b
AAD      O vs. P         .92    .79    .83   1.31    .68    .67    .98
         O vs. A         .56    .88    .47    .65    .16    .25    .53
         A vs. P         .65    .62    .64    .65    .67    .65    .65
RMS      O vs. P        1.25   1.25   1.24   1.48    .95   1.27   1.39
         O vs. A         .87   1.04    .75   1.32    .40    .53    .84
         A vs. P         .91   1.88    .90    .91    .92    .92    .92

Note.  O = Optimum or 6 tables
       P = Present operational table
       A = Average of 6 tables from present study


It should be noted that the values for RMS exceed those for AAD by a noticeable margin. Because RMS weights large deviations more heavily than AAD does, this indicates that a few relatively large errors exist (four percentile points for one raw score in AFQT). Inspection of the tables indicates that these deviations are generally limited to very low score ranges, which is probably attributable to guessing answers to the test items.
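The relation between the two indexes can be written out for N deviations d_i between two tables' percentile equivalents; the example vectors are invented for illustration.

```latex
\mathrm{AAD} = \frac{1}{N}\sum_{i=1}^{N} \lvert d_i \rvert ,
\qquad
\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} d_i^{2}} \;\ge\; \mathrm{AAD},
\quad \text{with equality only when all } \lvert d_i \rvert \text{ are equal.}

d = (1, 1, 1, 1):\quad \mathrm{AAD} = \mathrm{RMS} = 1;
\qquad
d = (0, 0, 0, 4):\quad \mathrm{AAD} = 1,\quad \mathrm{RMS} = \sqrt{16/4} = 2.
```

Both vectors have the same AAD, but concentrating the error in one large deviation doubles the RMS, which is why a clear excess of RMS over AAD points to a few large errors rather than many small ones.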

Table 10 shows the deviations across the boundary lines between the five mental categories for the 15,115 subjects in the study. The comparison in Table 10 is between the conversion table put into effect 1 October 1980 and the form-specific tables developed in the present study (six tables in all). Off-diagonal entries are deviations.


Table 10.  Classification by Mental Category Based on
           One Versus Six Tables

Category by       Category by Operational Table
Six Tables           V      IV     III      II       I
V                  934
IV                 177    5015
III                        121    5199     224
II                                        3045     156
I                                                  244

The proportion of deviations crossing boundaries can be computed by dividing the sum of the off-diagonal entries by the sum of all the entries; it is 4.5%. To evaluate this percentage, a similar computation was done on Form 8a alone (not shown), comparing the operational table outcomes with those from the form-specific table for 8a from the current study. The proportion of deviations across category lines was 2.4%. This value is useful as it provides an estimate of the expected deviations. Against this baseline, the 4.5% representing the comparison of the present table with the six tables is relatively small.
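The 4.5% figure can be reproduced from the Table 10 counts. The cell positions below follow one reading of the flattened table (category by the six tables first, category by the operational table second); only which counts lie off the diagonal matters for the proportion. The dictionary and function names are illustrative.

```python
# Table 10 counts keyed by (category by six tables, category by operational table).
TABLE_10 = {
    ("V", "V"): 934,
    ("IV", "V"): 177, ("IV", "IV"): 5015,
    ("III", "IV"): 121, ("III", "III"): 5199, ("III", "II"): 224,
    ("II", "II"): 3045, ("II", "I"): 156,
    ("I", "I"): 244,
}


def boundary_crossing_share(table):
    """Return (off-diagonal count, total count, off-diagonal proportion)."""
    total = sum(table.values())
    off = sum(n for (six, oper), n in table.items() if six != oper)
    return off, total, off / total


off, total, share = boundary_crossing_share(TABLE_10)
print(f"{off} of {total} subjects cross a boundary ({share:.1%})")
# prints "678 of 15115 subjects cross a boundary (4.5%)"
```

The diagonal cells sum to 14,437, the number of subjects the text reports as showing no deviation.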
It was also deemed appropriate to investigate the number of deviations that were 1, 2, 3, or more percentile points in magnitude. Table 11 shows the deviations crossing categories. As may be observed, most of the deviations are not greater than one percentile point. Relatively few ever assume the magnitude of three percentile points, and none are greater. It should be noted that for 14,437 subjects no deviations were observed.


Table 11.  Deviation of Percentile Scores across Category Lines

                        Size of Deviations
Category      N     1 point    2 point    3 point
IV-V        177        71%        29%
III-IV      121        69%        21%
II-III      224        75%        25%
I-II        156        35%        43%        22%

IV.  CONCLUSIONS 

Forms  8,  9,  and  10  of  ASVAB  were  found  to  be  parallel  when  equated  to  AFQT-7a,  and  a  single  conversion 
table  was  deemed  appropriate  for  operational  enlistment  processing. 